License

These files are distributed under the Creative Commons Attribution 2.5 South Africa license. 

All files are distributed under the same conditions.
_______________________________________________
License: Creative Commons Attribution 2.5 South Africa
URL: http://creativecommons.org/licenses/by/2.5/za/

Attribute work to: South African Department of Arts and Culture & Centre for Text Technology (CTexT, North-West University, South Africa)

Attribute work to URL: http://www.nwu.ac.za/ctext 
______________________________________________

Required files and tree structure:
	./
	|_ salid(.exe)
	|_ settings.ini
	|_ ./models
		|_ af-6.dat
		|_ en-6.dat
		|_ nr-6.dat
		|_ nso-6.dat
		|_ ss-6.dat
		|_ st-6.dat
		|_ tn-6.dat
		|_ ts-6.dat
		|_ ve-6.dat
		|_ xh-6.dat
		|_ zu-6.dat
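The layout above can be checked with a short script before running salid. The helper below is a hypothetical sketch in Python and is not part of salid. It only tests that each listed file exists, not that the models are valid. It assumes the executable is named salid.exe on Windows and salid elsewhere.

```python
import sys
from pathlib import Path

# Language codes of the bundled models listed in the tree above.
LANGUAGE_CODES = ("af", "en", "nr", "nso", "ss", "st", "tn", "ts", "ve", "xh", "zu")
MODEL_FILES = [f"{code}-6.dat" for code in LANGUAGE_CODES]


def missing_files(root: str, windows: bool = False) -> list:
    """Return the required paths, relative to root, that are not regular files."""
    base = Path(root)
    # Assumption: "salid(.exe)" means salid.exe on Windows, salid elsewhere.
    executable = "salid.exe" if windows else "salid"
    required = [executable, "settings.ini"] + [f"models/{name}" for name in MODEL_FILES]
    return [rel for rel in required if not (base / rel).is_file()]


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_files(root, windows=sys.platform.startswith("win"))
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required files are present.")
```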

Usage:
	1. Open a command prompt and navigate to the directory that contains salid(.exe).
	2. Quick look:
		salid <id|train|server> -h             lists the additional options for a mode
		salid -v <id|train|server> [options]   verbose output (optional)
		salid -s <id|train|server> [options]   suppresses output to a minimum (optional)
		salid <id|train> -ver                  prints the version number
		
	3. To run the id option:
		salid id -i "input file | input directory | input phrase"
			identifies a file, a directory of files or a phrase. A phrase must be enclosed in double quotes (").
		salid id -i "input file | input directory | input phrase" -o "output filename | directory"
			identifies the input as above and writes the result to the given output file or directory.
		salid id -t
			opens the input tool.
		salid id -i "input file | input directory | input phrase" -o "output filename | directory" -l
			identifies the input line by line. Without the -l flag, the input is identified at document level.
		salid id -i "input file | input directory | input phrase" -o "output filename | directory" -b 80
			identifies the input using a benchmark (confidence) percentage, here 80. The value must be between 0 and 100.
		Examples:
			salid id -i "hello world"
			salid id -i sample.txt
			salid id -i C:\samples
			salid id -i sample.txt -o result.txt
			salid id -i C:\samples -o C:\results
			salid id -t
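		For scripted use, the documented id flags can be assembled into a command line and run from Python. The helpers salid_id_args and run_salid_id are hypothetical and are not shipped with salid. They assume salid is on the PATH (otherwise pass exe= with its full path) and that -l and -b may be combined. Passing the arguments as a list means no shell quoting is needed, even for phrases that contain spaces. The format of salid's printed output is not documented here, so run_salid_id simply returns it as text.

```python
import subprocess


def salid_id_args(inp, output=None, line_level=False, benchmark=None, exe="salid"):
    """Build the argument list for 'salid id' from the documented flags."""
    args = [exe, "id", "-i", inp]
    if output is not None:
        args += ["-o", output]          # write the result to a file or directory
    if line_level:
        args.append("-l")               # identify line by line
    if benchmark is not None:
        if not 0 <= benchmark <= 100:
            raise ValueError("benchmark must be between 0 and 100")
        args += ["-b", str(benchmark)]  # benchmark (confidence) percentage
    return args


def run_salid_id(inp, **options):
    """Run 'salid id' and return whatever it prints to standard output."""
    completed = subprocess.run(
        salid_id_args(inp, **options),
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout
```

		Example: print(run_salid_id("Ke a leboga rra", benchmark=80))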
			
	4. To run the train option:
		salid train -f "filename" -n 6 -l "Language name" -o "data filename" -q 5 -c
			trains a new language model.
			Required:
				-f training corpus filename
				-n n-gram weight
				-l language name
				-o output data (.dat) filename
				-q remove entries with a frequency lower than the given value
			Optional:
				-c clean punctuation from the corpus
		Examples:
			salid train -f CORPUS.txt -n 6 -l isiNdebele -o nr-6 -q 5 -c
			salid train -f FILE.txt -n 3 -l Afrikaans -o afDAT -q 10
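		The internal format of salid's .dat models is not documented here. As a rough illustration of what the -n, -q and -c options control, the sketch below builds a generic character n-gram frequency profile in Python. It counts n-grams of every length from 1 to n, drops those with a frequency lower than the -q value, and optionally strips ASCII punctuation first. Treating -n as the maximum n-gram length, lowercasing the text and padding words with "_" are assumptions of this sketch, not salid's documented behaviour.

```python
import string
from collections import Counter


def train_profile(text, n, min_freq, clean_punctuation=False):
    """Count character n-grams of lengths 1..n and drop rare ones.

    A generic illustration only; this is not salid's .dat format.
    """
    if clean_punctuation:
        # Mirrors the idea of -c; string.punctuation covers ASCII punctuation only.
        text = text.translate(str.maketrans("", "", string.punctuation))
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries
        for size in range(1, n + 1):
            for i in range(len(padded) - size + 1):
                counts[padded[i:i + size]] += 1
    # Mirrors the idea of -q: remove n-grams with a frequency lower than min_freq.
    return Counter({gram: c for gram, c in counts.items() if c >= min_freq})
```

		The first example above would correspond roughly to train_profile(open("CORPUS.txt", encoding="utf-8").read(), n=6, min_freq=5, clean_punctuation=True).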
	
	5. To run the network option(*):
		salid server
			starts the application, loads the models and listens for clients on the default port (7770).
		* The server listens for connections on simple INET sockets.
		* Data is expected in a JSON-style format with two properties, text and benchmark.
			Example: { text:"hello world", benchmark:80 }
			Example: { benchmark:0, text:"Ke a leboga rra" }
		* The JSON string must be UTF-8 encoded and sent as a byte stream.
		* The response is also a byte stream; decode it as a UTF-8 string to obtain the result in JSON format.
			Example: { language:"Afrikaans", confidence:0.183551226 } 
			Example: { language:"Xitsonga", confidence:0.983545621 } 
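		A minimal client sketch in Python follows. It assumes the server accepts strict JSON (quoted keys), handles one request per connection, and replies with a single flat object whose end is marked by its closing brace. The function names and the 127.0.0.1 default host are illustrative and not part of salid. Because the documented examples use unquoted keys, parse_response falls back to quoting bare keys before decoding; that fallback assumes string values do not themselves contain a comma or brace followed by a word and a colon.

```python
import json
import re
import socket

# Matches a bare (unquoted) key such as  language:  after "{" or ",".
_BARE_KEY = re.compile(r'([{,]\s*)([A-Za-z_][A-Za-z0-9_]*)\s*:')


def build_request(text: str, benchmark: int) -> bytes:
    """Encode a request as UTF-8 JSON with the two documented properties."""
    payload = {"text": text, "benchmark": benchmark}
    # ensure_ascii=False keeps non-ASCII characters as real UTF-8 bytes.
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


def parse_response(raw: bytes) -> dict:
    """Decode a UTF-8 response; tolerate the unquoted keys shown in the examples."""
    s = raw.decode("utf-8").strip()
    try:
        return json.loads(s)
    except json.JSONDecodeError:
        return json.loads(_BARE_KEY.sub(r'\1"\2":', s))


def classify(text: str, benchmark: int = 0, host: str = "127.0.0.1",
             port: int = 7770, timeout: float = 10.0) -> dict:
    """Send one request to a running 'salid server' and return the parsed reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(text, benchmark))
        buf = b""
        # Assumption: the reply is one flat object, so it is complete at "}".
        while not buf.rstrip().endswith(b"}"):
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection
                break
            buf += chunk
    return parse_response(buf)
```

		With the server running, classify("Ke a leboga rra", benchmark=0) should return a dictionary with language and confidence entries.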
